
    SVD-DIP: Overcoming the Overfitting Problem in DIP-based CT Reconstruction

    The deep image prior (DIP) is a well-established unsupervised deep learning method for image reconstruction; yet it is far from flawless. The DIP overfits to noise unless it is stopped early or optimized via a regularized objective. We build on the regularized fine-tuning of a pretrained DIP by adopting a novel strategy that restricts learning to the adaptation of singular values. The proposed SVD-DIP uses ad hoc convolutional layers whose pretrained parameters are decomposed via the singular value decomposition. Optimizing the DIP then consists solely of fine-tuning the singular values, while keeping the left and right singular vectors fixed. We thoroughly validate the proposed method on real-measured μCT data of a lotus root as well as two medical datasets (LoDoPaB and Mayo). We report significantly improved stability of the DIP optimization by overcoming the overfitting to noise.
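
    The core mechanism lends itself to a short illustration. Below is a minimal PyTorch sketch (our construction, not the authors' released code) of an SVD-parameterized convolutional layer: the pretrained kernel is factorized once, the singular vectors are frozen as buffers, and only the singular values remain trainable during fine-tuning. The class name and shape handling are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SVDConv2d(nn.Module):
    """Conv layer whose pretrained kernel is re-parameterized via SVD;
    only the singular values are trainable (a sketch, not the paper's code)."""
    def __init__(self, pretrained_conv: nn.Conv2d):
        super().__init__()
        w = pretrained_conv.weight.data            # (out_c, in_c, kh, kw)
        self.shape = w.shape
        w2d = w.reshape(self.shape[0], -1)         # flatten to a 2-D matrix
        u, s, vh = torch.linalg.svd(w2d, full_matrices=False)
        self.register_buffer("u", u)               # frozen left singular vectors
        self.register_buffer("vh", vh)             # frozen right singular vectors
        self.s = nn.Parameter(s)                   # only these are fine-tuned
        if pretrained_conv.bias is not None:
            self.register_buffer("bias", pretrained_conv.bias.data.clone())
        else:
            self.bias = None
        self.stride = pretrained_conv.stride
        self.padding = pretrained_conv.padding

    def forward(self, x):
        w = (self.u * self.s) @ self.vh            # reassemble U diag(S) V^T
        return F.conv2d(x, w.reshape(self.shape), self.bias,
                        stride=self.stride, padding=self.padding)

layer = SVDConv2d(nn.Conv2d(3, 16, 3, padding=1))
out = layer(torch.randn(1, 3, 32, 32))             # (1, 16, 32, 32)
```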

    Acoustic Event Detection and Localization with Regression Forests

    This paper proposes an approach for efficient automatic joint detection and localization of single-channel acoustic events using random forest regression. The audio signals are decomposed into multiple densely overlapping "superframes" annotated with event class labels and their displacements to the temporal starting and ending points of the events. Using the displacement information, a multivariate random forest regression model is learned for each event category to map each superframe to continuous estimates of the onset and offset locations of the events. In addition, two classifiers are trained using random forest classification to distinguish superframes of background and of the different event categories. At test time, based on the detection of category-specific superframes by the classifiers, the learned regressor provides estimates of the onset and offset times of the corresponding event. While posing event detection and localization as a regression problem is novel, the quantitative evaluation on the ITC-Irst database of highly variable acoustic events shows the efficiency and potential of the proposed approach.
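
    To make the regression-forest formulation concrete, here is a hedged, self-contained sketch using scikit-learn with toy stand-in data; the feature extraction, the median-based voting, and all variable names are our illustrative assumptions rather than the paper's exact pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(0)

# Toy stand-ins for superframes; real features come from the audio signal.
n = 500
X = rng.normal(size=(n, 20))               # superframe feature vectors
y = rng.integers(0, 2, size=n)             # 1 = event superframe, 0 = background
t = np.sort(rng.uniform(0, 60, size=n))    # superframe center times (seconds)
d = np.abs(rng.normal(size=(n, 2)))        # displacements to onset and offset

# One classifier separates event superframes from background; a multivariate
# regressor maps event superframes to their boundary displacements.
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
reg = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[y == 1], d[y == 1])

# Test time: each detected event superframe votes for absolute boundaries
# via its predicted displacements; the votes are aggregated by a median.
mask = clf.predict(X) == 1
pred = reg.predict(X[mask])
onset = np.median(t[mask] - pred[:, 0])    # center minus onset displacement
offset = np.median(t[mask] + pred[:, 1])   # center plus offset displacement
print(f"estimated event span: {onset:.2f}s to {offset:.2f}s")
```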

    Early Event Detection in Audio Streams

    Audio event detection has been an active field of research in recent years. However, most of the proposed methods, if not all, analyze and detect complete events, and little attention has been paid to early detection. In this paper, we present a system that enables early audio event detection in continuous audio recordings, in which an event can be reliably recognized when only part of its duration has been observed. Our evaluation on the ITC-Irst database, one of the standard databases of the CLEAR 2006 evaluation, shows that, on the one hand, the proposed system outperforms the best baseline system by 16% and 8% in terms of detection error rate and detection accuracy, respectively; on the other hand, even partial events are enough to achieve the performance obtainable when whole events are observed.
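
    A minimal sketch of the underlying idea, assuming per-frame event probabilities from some upstream classifier: a detection is fired as soon as the running confidence over the partially observed event crosses a threshold, rather than waiting for the event to end. The thresholding rule below is our illustration, not the paper's exact decision function.

```python
import numpy as np

def early_detect(frame_scores, threshold=0.8, min_frames=5):
    """frame_scores: per-frame event probabilities from any classifier.
    Returns the number of frames observed before detection, or None."""
    running = np.cumsum(frame_scores) / (np.arange(len(frame_scores)) + 1)
    for i, score in enumerate(running):
        if i + 1 >= min_frames and score >= threshold:
            return i + 1
    return None

# An event whose evidence builds up over time is detected after 9 of its
# 20 frames, well before the full event has been observed.
scores = np.concatenate([np.full(3, 0.5), np.full(17, 0.95)])
print(early_detect(scores))    # 9
```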

    Robust Audio Event Recognition with 1-Max Pooling Convolutional Neural Networks

    We present in this paper a simple yet efficient convolutional neural network (CNN) architecture for robust audio event recognition. In contrast to deep CNN architectures with multiple convolutional and pooling layers topped by multiple fully connected layers, the proposed network consists of only three layers: a convolutional, a pooling, and a softmax layer. Two further features distinguish it from the deep architectures that have been proposed for the task: varying-size convolutional filters at the convolutional layer and a 1-max pooling scheme at the pooling layer. Intuitively, the network tends to select the most discriminative features from the whole audio signal for recognition. Our proposed CNN not only shows state-of-the-art performance on the standard task of robust audio event recognition but also outperforms other deep architectures by up to 4.5% in terms of recognition accuracy, which is equivalent to a 76.3% relative error reduction.
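
    The three-layer architecture is compact enough to sketch in full. The following PyTorch snippet is a hedged illustration with assumed input shapes (40 mel bands, arbitrary frame count) and filter widths; only the overall structure, namely parallel varying-width convolutions, 1-max pooling over time, and a softmax output layer, follows the description above.

```python
import torch
import torch.nn as nn

class OneMaxCNN(nn.Module):
    def __init__(self, n_bands=40, n_classes=10, widths=(3, 5, 7), n_filters=100):
        super().__init__()
        # One convolution per filter width, each spanning the full frequency axis.
        self.convs = nn.ModuleList(
            nn.Conv2d(1, n_filters, kernel_size=(n_bands, w)) for w in widths
        )
        self.fc = nn.Linear(n_filters * len(widths), n_classes)

    def forward(self, x):                         # x: (batch, 1, n_bands, n_frames)
        feats = []
        for conv in self.convs:
            h = torch.relu(conv(x)).squeeze(2)    # (batch, n_filters, time)
            feats.append(h.max(dim=2).values)     # 1-max pooling over time
        return self.fc(torch.cat(feats, dim=1))   # logits; softmax applied in the loss

model = OneMaxCNN()
logits = model(torch.randn(4, 1, 40, 120))        # e.g. 40-band, 120-frame input
print(logits.shape)                               # torch.Size([4, 10])
```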

    Early Prediction of Future Hand Movements Using sEMG Data

    We study in this work the feasibility of early prediction of hand movements based on sEMG signals, in order to overcome the time-delay issue of conventional classification. In contrast to the classification task, the objective of the early prediction task is to predict a hand movement that is going to occur in the future, given the information up to the current time point. The ability to predict early may allow a hand prosthesis control system to compensate for the time delay and, as a result, improve usability. Experimental results on the Ninapro database show that we can predict up to 300 ms ahead in the future while the prediction accuracy remains very close to that of standard classification, i.e. it is just marginally lower. Furthermore, historical data prior to the current time window is shown to be very important for improving performance, not only for the prediction task but also for the classification task.
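
    The task setup can be illustrated with a short sketch of how training pairs might be formed: features from a window (plus preceding history) ending at the current time are paired with the movement label a fixed horizon into the future. The sampling rate, window lengths, and toy features below are all our assumptions, not Ninapro specifics.

```python
import numpy as np

def make_pairs(emg, labels, fs=100, win_ms=200, horizon_ms=300, hist_ms=400):
    """emg: (n_samples, n_channels); labels: (n_samples,) movement labels."""
    win, hor, hist = (int(ms * fs / 1000) for ms in (win_ms, horizon_ms, hist_ms))
    X, y = [], []
    for end in range(hist + win, len(emg) - hor):
        seg = emg[end - win - hist:end]          # current window plus history
        X.append(np.concatenate([seg.mean(0), seg.std(0)]))  # toy features
        y.append(labels[end + hor])              # label `horizon_ms` in the future
    return np.array(X), np.array(y)

emg = np.random.randn(2000, 8)                   # toy 8-channel recording
labels = np.repeat(np.arange(4), 500)            # toy movement annotations
X, y = make_pairs(emg, labels)
print(X.shape, y.shape)                          # (1910, 16) (1910,)
```

    Any standard classifier trained on these pairs then answers "which movement will be active 300 ms from now" instead of "which movement is active now".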

    Recurrent Neural Network Based Early Prediction of Future Hand Movements

    This work focuses on a system for hand prostheses that can overcome the delay problem introduced by classical approaches while remaining reliable. The proposed approach, based on a recurrent neural network, enables us to incorporate the sequential nature of the surface electromyogram data, and the proposed system can be used either for classification or for early prediction of hand movements. The latter, in particular, is key to latency-free steering of a prosthesis. The experiments conducted on the first three Ninapro databases reveal that prediction up to 200 ms ahead in the future is possible without a significant drop in accuracy. Furthermore, for classification, our proposed approach outperforms state-of-the-art classifiers even though we used significantly shorter windows for feature extraction.
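
    A hedged PyTorch sketch of such a network: an LSTM emits a prediction at every time step, and training pairs the output at time t with the label at t plus the horizon, so the model learns to anticipate rather than merely classify. Layer sizes and the single-layer LSTM are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class EarlyPredictionRNN(nn.Module):
    def __init__(self, n_channels=8, hidden=128, n_classes=12):
        super().__init__()
        self.lstm = nn.LSTM(n_channels, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):              # x: (batch, time, n_channels)
        h, _ = self.lstm(x)            # hidden state at every time step
        return self.head(h)            # per-step logits: (batch, time, classes)

# Training would align the logits at step t with the label at t + horizon.
model = EarlyPredictionRNN()
logits = model(torch.randn(4, 150, 8))
print(logits.shape)                    # torch.Size([4, 150, 12])
```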

    Audio Phrases for Audio Event Recognition

    The bag-of-audio-words approach has been widely used for audio event recognition. In these models, a local feature of an audio signal is matched to a code word according to a learned codebook. The signal is then represented by the frequencies of the matched code words over the whole signal. We present in this paper an improved model based on the idea of audio phrases, which are sequences of multiple audio words. By using audio phrases, we are able to capture the relationship between otherwise isolated audio words and produce more semantic descriptors. Furthermore, we propose an efficient approach to learn a compact codebook in a discriminative manner, to deal with the high dimensionality of bag-of-audio-phrases representations. Experiments on the Freiburg-106 dataset show that our proposed bag-of-audio-phrases descriptor outperforms not only the baselines but also the state-of-the-art results on this dataset.
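
    The phrase construction is easy to illustrate. The sketch below (our construction, not the paper's code) builds an ordinary k-means codebook, quantizes frames into audio words, and histograms bigrams of consecutive words; the discriminative compact-codebook learning proposed in the paper is omitted, and all sizes are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def bag_of_audio_phrases(frames, codebook, n=2):
    """frames: (n_frames, n_features) local descriptors of one signal."""
    words = codebook.predict(frames)                  # frame -> audio word
    k = codebook.n_clusters
    hist = np.zeros(k ** n)
    for i in range(len(words) - n + 1):
        phrase = words[i:i + n]                       # n consecutive audio words
        idx = sum(w * k ** j for j, w in enumerate(phrase))
        hist[idx] += 1
    return hist / max(hist.sum(), 1)                  # normalized descriptor

train_frames = np.random.randn(5000, 13)              # toy MFCC-like features
codebook = KMeans(n_clusters=32, n_init=4, random_state=0).fit(train_frames)
desc = bag_of_audio_phrases(np.random.randn(400, 13), codebook)
print(desc.shape)                                     # (1024,) = 32**2 phrases
```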